Multilingual extraction and editing of concept strings for the legal domain

Authors

  • Andrea Varga
  • Andrew N. Edmonds
Abstract

Identifying semantic expressions, so-called concept strings (CSs), in multilingual corpora is an important NLP task, as it allows web search engines to define and perform semantic queries over large collections of documents. Existing web search engines in the legal domain are mainly limited to keyword search, in which the query word is matched against the textual content of the documents. This paper presents a novel framework, the Concept Strings Framework, that uses CSs to represent the content of documents and to allow semantic search over them. These CSs can consist of individual knowledge base (KB) concepts (e.g. WordNet concepts) or combinations of them. In addition, this paper presents an interactive web-based toolkit, called the Template Editor, that enables the creation, editing and evaluation of CSs. Experiments on two publicly available legislation websites show satisfactory

Similar resources

Cultural Frame and Translation of Pronominal Adverbs in Legal English

This paper explores the relationship between cultural knowledge and the specific meaning of a pronominal adverb in legal English where Chinese translators need to get the correct translation in their venture into translating the language of law. On the one hand, relying on the relevant legal cultural knowledge functioning as domain-general reference within a community or jurisdiction, tra...

Applications of a Concept Mapping Tool

In this paper, a general-purpose tool for editing Concept Maps (CM-ED) is presented. The tool offers templates, views and facilities for multilingual concept maps. It has been used in different tasks in the Computer Aided Teaching/Learning area, specifically domain representation, exercise design and student model visualization.

Extraction of Multilingual Term Variants in the Business Reporting Domain

Within the context of the European research project "Monnet", which implements among other activities ontology-based multilingual information extraction, we tackle the issue of recognizing variants of concept labels in business reports that guide the information extraction process. In this short paper, we describe two related experiments in finding variants of multilingual taxonomy labels u...

Automatic Acquisition of Semantics-Extraction Patterns

This paper examines the use of parallel and comparable corpora for automatic acquisition of semantics-extraction patterns. It presents a new method of pattern extraction which takes advantage of parallel texts to "port" text mining solutions from a source language to a target language. It is shown that the technique can help in situations when the extraction procedure is to be applied in a ...

Value Added Tagging for Multilingual Resource Management

The Legebiduna project brings together state-of-the-art techniques in multilingual corpus management, generic mark-up, text segmentation and alignment, terminological extraction, automatic text cataloguing, and reutilisation of recurrent text in specialised documentation. We report on the experience of a four-year project of bilingual corpus mining in a dedicated domain of official bilingual pub...

Publication date: 2016